Skip to content

ENH: .shift optionally takes multiple periods #54115

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 32 commits into from
Jul 28, 2023

Conversation

jona-sassenhagen
Copy link
Contributor

@jona-sassenhagen jona-sassenhagen commented Jul 13, 2023

Taking over from https://github.com/pandas-dev/pandas/pull/44660/files

@jona-sassenhagen jona-sassenhagen changed the title init ENH: .shift optionally takes multiple periods Jul 13, 2023
@jona-sassenhagen jona-sassenhagen marked this pull request as ready for review July 13, 2023 20:36
@jona-sassenhagen
Copy link
Contributor Author

jona-sassenhagen commented Jul 14, 2023

I think the pyarrow failure is unrelated? Otherwise green.

@jona-sassenhagen
Copy link
Contributor Author

Ok now it's all green.

Copy link
Member

@rhshadrach rhshadrach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is looking good! Some requests/suggestions below.

@mroeschke mroeschke added Groupby Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff labels Jul 17, 2023
@@ -3002,7 +3002,7 @@ def autocorr(self, lag: int = 1) -> float:
>>> s.autocorr()
nan
"""
return self.corr(self.shift(lag))
return self.corr(cast(Series, self.shift(lag)))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is necessary because Series.shift can now return a dataframe if there are multiple periods. Mypy complains at this line because it wants Series and not Series | Dataframe, but because there is only one lag, it's always Series, so we can just cast.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could add type-overloads to handle this. Sometimes they can get gnarly (see e.g. concat), but I think this one should be relatively straight forward.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer not to if that's ok? There is no Series.shift to overload so I feel like it would be confusing to newcomers like me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you insist I will try!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise I think this is done?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @rhshadrach , do you have time for another round/sign this off?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm okay with opening an issue regarding this once it's merged for someone to followup on.

@jona-sassenhagen
Copy link
Contributor Author

All green!

@@ -3002,7 +3002,7 @@ def autocorr(self, lag: int = 1) -> float:
>>> s.autocorr()
nan
"""
return self.corr(self.shift(lag))
return self.corr(cast(Series, self.shift(lag)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm okay with opening an issue regarding this once it's merged for someone to followup on.

Copy link
Member

@rhshadrach rhshadrach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just about there; one last request I think.

@jona-sassenhagen
Copy link
Contributor Author

jona-sassenhagen commented Jul 25, 2023

@rhshadrach @mroeschke I originally wanted this functionality to build simple Almon distributed lags models (or the corresponding thing from cognitive neuroscience), would there be a room for a Pandas docs example for this? Basically showing how an impulse response function can be recovered via regression of a multi-lagged feature.

E: maybe this PR isn't the space for this, I can also move to an issue

@jona-sassenhagen
Copy link
Contributor Author

Done - @rhshadrach everything ok from your end?

Copy link
Member

@rhshadrach rhshadrach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@rhshadrach rhshadrach added this to the 2.1 milestone Jul 28, 2023
@rhshadrach rhshadrach added the Transformations e.g. cumsum, diff, rank label Jul 28, 2023
@rhshadrach rhshadrach merged commit 6a40f3b into pandas-dev:main Jul 28, 2023
@rhshadrach
Copy link
Member

Thanks @jona-sassenhagen! Great work!

@jona-sassenhagen jona-sassenhagen deleted the jona/44424_shift branch July 29, 2023 10:53
@jona-sassenhagen
Copy link
Contributor Author

Thanks @rhshadrach !

May I re-raise this: #54115 (comment)

@rhshadrach
Copy link
Member

May I re-raise this: #54115 (comment)

I recommend making an issue. There a few places that come to my mind:

A user guide example would need to be (a) self-contained and (b) not too long to be appropriate; the cookbook should be for generic operations (no domain context); and the ecosystem for packages. I'm not sure exactly what you're thinking of, but it sounds like it may not fit in any of these. But we can discuss further in an issue.

@jona-sassenhagen
Copy link
Contributor Author

Thanks, I will do that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff Groupby Transformations e.g. cumsum, diff, rank
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ENH: pd.Series.shift and .diff to accept a collection of numbers
3 participants